Eliminate race conditions and remove DATAROOT last in cleanup #2893

Merged
WalterKolczynski-NOAA merged 3 commits into NOAA-EMC:develop from DavidHuber-NOAA:fix/cleanup
Sep 6, 2024

Conversation

@DavidHuber-NOAA DavidHuber-NOAA commented Sep 5, 2024

Description

This reorders the cleanup job so that the working directory (DATAROOT) is deleted last. It also adds the -ignore_readdir_race flag to find, which prevents errors when a file is deleted after find has collected its list of files. This can happen when two consecutive cycles run the cleanup job at the same time.
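
As a rough illustration of the two ideas described above, here is a minimal bash sketch. It is not the contents of scripts/exglobal_cleanup.sh: the function names `purge_old_files` and `cleanup`, their arguments, and the age-based selection are all assumptions made for the example. Only `-ignore_readdir_race` (a GNU find option) and the "working directory last" ordering come from the description.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the PR): delete regular files older than
# max_age_days under target_dir.
purge_old_files() {
    local target_dir="$1"
    local max_age_days="$2"
    # -ignore_readdir_race tells GNU find not to report an error when an
    # entry vanishes between being read from the directory and being
    # examined, e.g. because another cleanup job removed it first.
    # rm -f likewise stays quiet if the file is already gone.
    find "${target_dir}" -ignore_readdir_race -type f \
        -mtime "+${max_age_days}" -exec rm -f {} +
}

# Hypothetical driver (not from the PR): prune old output first, then
# remove the working directory as the final step.
cleanup() {
    local output_dir="$1"
    local work_dir="$2"
    local max_age_days="$3"
    purge_old_files "${output_dir}" "${max_age_days}"
    rm -rf "${work_dir}"
}
```

A call such as `cleanup /path/to/output /path/to/workdir 5` (placeholder paths) would prune files older than five days and only then remove the working directory.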

Resolves #2880

Type of change

  • Bug fix (fixes something broken)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? NO

How has this been tested?

A 5-cycle test on WCOSS2

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes

@WalterKolczynski-NOAA WalterKolczynski-NOAA changed the title Fix/cleanup Eliminate race conditions in cleanup Sep 5, 2024
@WalterKolczynski-NOAA WalterKolczynski-NOAA changed the title Eliminate race conditions in cleanup Eliminate race conditions and remove DATAROOT last in cleanup Sep 5, 2024
Comment thread scripts/exglobal_cleanup.sh
aerorahul previously approved these changes Sep 5, 2024
@aerorahul aerorahul left a comment

Looks good.
I found a suggestion online to use sync that ensures all modifications are synchronized.
It should help at the least, but look it up and validate it.

@XuanliLi-NOAA XuanliLi-NOAA left a comment

Changes look good to me. I had removed the files in my directories due to the disk quota limit, but Sean Casey at QOSAP tested the changes (including those suggested by @aerorahul) on Hera and confirmed that they solved the problem.

@WalterKolczynski-NOAA WalterKolczynski-NOAA added the CI-Wcoss2-Ready PR is ready for CI testing on WCOSS2. label Sep 5, 2024
emcbot commented Sep 5, 2024

CI Update on Wcoss2 at 09/05/24 08:40:17 PM
=================================================
PR:2893 Reset to Wcoss2-Ready by user and is now restarting CI tests
No current experiments to cancel in PR: 2893 on Wcoss2

@emcbot emcbot added CI-Wcoss2-Building CI testing is cloning/building on WCOSS2 and removed CI-Wcoss2-Ready PR is ready for CI testing on WCOSS2. labels Sep 5, 2024
emcbot commented Sep 5, 2024

CI Update on Wcoss2 at 09/05/24 08:40:21 PM
============================================
Cloning and Building global-workflow PR: 2893
with PID: 146574 on host: dlogin03

@emcbot emcbot added CI-Wcoss2-Running CI testing on WCOSS for this PR is in-progress and removed CI-Wcoss2-Building CI testing is cloning/building on WCOSS2 labels Sep 5, 2024
emcbot commented Sep 5, 2024

Automated global-workflow Testing Results:

Machine: Wcoss2
Start: Thu Sep  5 20:43:11 UTC 2024 on dlogin03
---------------------------------------------------
Build: Completed at 09/05/24 09:27:58 PM
Case setup: Completed for experiment C48_ATM_b502db38
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_b502db38
Case setup: Skipped for experiment C48_S2SWA_gefs_b502db38
Case setup: Completed for experiment C48_S2SW_b502db38
Case setup: Completed for experiment C96_atm3DVar_extended_b502db38
Case setup: Skipped for experiment C96_atm3DVar_b502db38
Case setup: Completed for experiment C96C48_hybatmaerosnowDA_b502db38
Case setup: Completed for experiment C96C48_hybatmDA_b502db38
Case setup: Completed for experiment C96C48_ufs_hybatmDA_b502db38

@emcbot emcbot added CI-Wcoss2-Passed CI testing on WCOSS for this PR has completed successfully and removed CI-Wcoss2-Running CI testing on WCOSS for this PR is in-progress labels Sep 6, 2024
emcbot commented Sep 6, 2024

All CI Test Cases Passed on Wcoss2:

Experiment C48_ATM_b502db38 *** SUCCESS *** at 09/05/24 10:57:10 PM
Experiment C48_S2SW_b502db38 *** SUCCESS *** at 09/05/24 11:00:17 PM
Experiment C96C48_hybatmDA_b502db38 *** SUCCESS *** at 09/06/24 12:06:41 AM
Experiment C96C48_hybatmaerosnowDA_b502db38 *** SUCCESS *** at 09/06/24 01:03:33 AM
Experiment C96C48_ufs_hybatmDA_b502db38 *** SUCCESS *** at 09/06/24 01:51:25 AM
Experiment C96_atm3DVar_extended_b502db38 *** SUCCESS *** at 09/06/24 10:42:49 AM

Co-authored-by: Rahul Mahajan <aerorahul@users.noreply.github.com>
@WalterKolczynski-NOAA WalterKolczynski-NOAA merged commit 6519211 into NOAA-EMC:develop Sep 6, 2024
DavidHuber-NOAA added a commit to DavidHuber-NOAA/global-workflow that referenced this pull request Sep 9, 2024
* origin/develop:
  Create JEDI class (NOAA-EMC#2805)
  Restructure the bufr sounding job    (NOAA-EMC#2853)
  Add an archive task to GEFS system to archive files locally (NOAA-EMC#2816)
  Reenable Orion Cycling Support (NOAA-EMC#2877)
  Eliminate race conditions and remove DATAROOT last in cleanup (NOAA-EMC#2893)
  Update aerosol climatology to 2013-2024 mean (NOAA-EMC#2888)
  Add ability to run CI test C96_atm3DVar.yaml to Gaea-C5 (NOAA-EMC#2885)
  Support global-workflow GEFS C48 on Google Cloud (NOAA-EMC#2861)
  Add 3 and 9 hr increment files to IC staging (NOAA-EMC#2876)
  Add diffusion/diag B for aerosol DA and some other needed changes (NOAA-EMC#2738)
  Correct ocean `MOM.res_#` stage copy (NOAA-EMC#2868)
  Support coupling on AWS (NOAA-EMC#2859)
  Add JEDI ATM lgetkf observer and solver jobs (NOAA-EMC#2833)
  Fix gdas build on Gaea and add Gaea to available CI list (NOAA-EMC#2857)
  Support ATM forecast only on Google (NOAA-EMC#2832)
  Add GEFS C48 support on AWS (NOAA-EMC#2818)
  Update omega calculation (NOAA-EMC#2751)
  Add snow DA update and recentering for the EnKF forecasts (NOAA-EMC#2690)
  support ATM forecast only on Azure (NOAA-EMC#2827)
  Convert staging job to python and yaml (NOAA-EMC#2651)
  Fixed test on UNAVAILBLE in python Rocoto check (NOAA-EMC#2842)
@DavidHuber-NOAA DavidHuber-NOAA deleted the fix/cleanup branch November 4, 2024 16:35
bbakernoaa pushed a commit to bbakernoaa/global-workflow that referenced this pull request Mar 19, 2026
…on/AQM.v7 and reverse the Rsnow value + allow hfreeze as a parameter for MOM_input NOAA-EMC#2920  (NOAA-EMC#2893)

* UFSWM - Add a variable to set the hfreeze value for MOM6 input.
* AQM - Bring in additional changes from production/AQM.v7 and reversed the Rsnow value

Labels

CI-Wcoss2-Passed CI testing on WCOSS for this PR has completed successfully

Development

Successfully merging this pull request may close these issues.

gdascleanup and enkfgdascleanup failures

5 participants